Business Running Case: Evaluating Personal Job Market Prospects in 2024
Project Phase III
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.io as pio
# Set global theme for all Plotly plots
pio.templates.default = "plotly_white"
# low_memory=False parses the whole file at once so pandas infers one dtype per column
# (this avoids the DtypeWarning about mixed types in columns 19 and 30)
df = pd.read_csv("lightcast_job_postings.csv", low_memory=False)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
# df.head()
# df.columns.tolist()
Creating a Skill Level DataFrame
import pandas as pd
skills_data = {
"Name": ["Andrey", "Moiz", "Jason", "Prabu","Jitvan"],
"Python (Programming Language)": [5, 3, 4, 4, 2],
"SQL (Programming Language)": [4, 3, 5, 3, 5],
"Microsoft Excel": [3, 5, 4, 4, 4],
"Power BI": [2, 4, 3, 3, 5],
"Tableau": [3, 4, 3, 4, 3]
}
df_skills = pd.DataFrame(skills_data)
df_skills.set_index("Name", inplace=True)
df_skills
| Name | Python (Programming Language) | SQL (Programming Language) | Microsoft Excel | Power BI | Tableau |
|---|---|---|---|---|---|
| Andrey | 5 | 4 | 3 | 2 | 3 |
| Moiz | 3 | 3 | 5 | 4 | 4 |
| Jason | 4 | 5 | 4 | 3 | 3 |
| Prabu | 4 | 3 | 4 | 3 | 4 |
| Jitvan | 2 | 5 | 4 | 5 | 3 |
Team Skill Insights
From the matrix above, we observe:
- Jason is strongest in SQL (5), with solid Python and Excel (4) and intermediate Power BI and Tableau (3).
- Jitvan excels in Power BI and SQL (5 each), but may benefit from further development in Python (2).
- Moiz demonstrates expert-level Excel (5) and strong Tableau and Power BI (4), indicating solid visualization and reporting capabilities.
- Andrey is an expert in Python but may require further training in Power BI.
- Prabu maintains intermediate-to-advanced competency across all tools, making him a well-rounded contributor.
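The per-member observations above can be checked directly from the skill matrix. The sketch below rebuilds the matrix with shortened column names (used only for readability) and applies pandas `idxmax`/`idxmin` row by row. On ties these functions return the first matching column, so Jitvan's joint top score in SQL and Power BI appears only as SQL.

```python
import pandas as pd

# Self-assessed scores from the skill matrix (1 = Beginner, 5 = Expert);
# column names are shortened here for readability only
df_skills = pd.DataFrame(
    {
        "Python": [5, 3, 4, 4, 2],
        "SQL": [4, 3, 5, 3, 5],
        "Excel": [3, 5, 4, 4, 4],
        "Power BI": [2, 4, 3, 3, 5],
        "Tableau": [3, 4, 3, 4, 3],
    },
    index=pd.Index(["Andrey", "Moiz", "Jason", "Prabu", "Jitvan"], name="Name"),
)

# Row-wise summary: each member's highest- and lowest-rated tool.
# idxmax/idxmin return the first column when several tools share the score.
member_summary = pd.DataFrame(
    {
        "strongest": df_skills.idxmax(axis=1),
        "top_score": df_skills.max(axis=1),
        "weakest": df_skills.idxmin(axis=1),
        "low_score": df_skills.min(axis=1),
        "mean_score": df_skills.mean(axis=1),
    }
)
print(member_summary)
```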
Visualizing Skill Gaps with Seaborn
import seaborn as sns
import matplotlib.pyplot as plt
# Create a mapping dictionary for display labels only
column_display_names = {
"Python (Programming Language)": "Python",
"SQL (Programming Language)": "SQL",
"Microsoft Excel": "Excel",
"Power BI": "Power BI",
"Tableau": "Tableau"
}
plt.figure(figsize=(10, 6))
ax = sns.heatmap(
df_skills,
annot=True,
cmap="coolwarm",
linewidths=0.5,
linecolor='white',
cbar=True,
fmt='g'
)
# Set cleaned display names just for x-axis
ax.set_xticklabels([column_display_names.get(label, label) for label in df_skills.columns],
rotation=0, ha='center', fontsize=10)
plt.title("Team Skill Levels Heatmap", fontsize=14)
plt.yticks(rotation=0, fontsize=10)
plt.tight_layout()
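# Save the figure (create the output folder first so savefig does not fail if it is missing)
import os
os.makedirs("plots.qmd", exist_ok=True)
plt.savefig("plots.qmd/Team_Skill_Levels_Heatmap.png", dpi=300, bbox_inches='tight')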
plt.show()
📊 Team Skill Levels – Heatmap Analysis
The heatmap above offers a visual overview of our team’s self-assessed proficiency across five core analytical tools: Python, SQL, Excel, Power BI, and Tableau (scale: 1 = Beginner, 5 = Expert). Because "coolwarm" is a diverging colormap, warm red cells mark higher scores and cool blue cells mark lower ones, with pale cells in between.
🔍 Key Insights:
- Python: Andrey stands out with expert-level skills (5), while Jitvan shows the lowest proficiency (2), indicating a potential area for development.
- SQL: Jason and Jitvan both score the highest (5), reflecting strong database querying capabilities, while others maintain intermediate competency.
- Excel: Moiz has the highest proficiency (5), with Jason, Prabu, and Jitvan close behind (4), a useful base for data wrangling and reporting.
- Power BI: Jitvan leads with a top score (5), suggesting strong data visualization capabilities; however, Andrey scores the lowest (2), indicating a potential gap.
- Tableau: All team members cluster around the intermediate level (3–4), showing consistent yet improvable skills across the board.
This heatmap complements the earlier skill matrix by allowing a quick comparative glance, which is particularly helpful in identifying areas of individual strength and skill gaps that may benefit from team-wide upskilling initiatives.
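The per-tool statements, such as Tableau scores clustering between 3 and 4, can be verified with column-wise statistics. The sketch below uses the same shortened column names as a readability assumption.

```python
import pandas as pd

# Same self-assessed scores as the heatmap (shortened column names)
df_skills = pd.DataFrame(
    {
        "Python": [5, 3, 4, 4, 2],
        "SQL": [4, 3, 5, 3, 5],
        "Excel": [3, 5, 4, 4, 4],
        "Power BI": [2, 4, 3, 3, 5],
        "Tableau": [3, 4, 3, 4, 3],
    },
    index=pd.Index(["Andrey", "Moiz", "Jason", "Prabu", "Jitvan"], name="Name"),
)

# Column-wise statistics: one row per tool
per_tool = pd.DataFrame(
    {
        "min": df_skills.min(),
        "max": df_skills.max(),
        "mean": df_skills.mean(),
    }
)
# A small range means the team is evenly matched on that tool
per_tool["range"] = per_tool["max"] - per_tool["min"]
print(per_tool.sort_values("mean", ascending=False))
```

A wide range (Python and Power BI, 3 points each) points to individual gaps. A narrow range (Tableau, 1 point) points to a shared baseline that team-wide training could lift.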
Compare team skills to industry requirements
TOP 5 IN DEMAND SOFTWARE SKILLS IN IT
import pandas as pd
import ast
from collections import Counter
import matplotlib.pyplot as plt
# Define the relevant skill columns
skill_columns = ["SOFTWARE_SKILLS_NAME"]
# Function to safely parse stringified lists
def extract_skills(row):
    skills = []
    for col in skill_columns:
        if pd.notna(row.get(col)):
            try:
                skills += ast.literal_eval(row[col])
            except Exception:
                continue
    return skills
# Apply function across rows
all_skills = df.apply(extract_skills, axis=1).explode().dropna()
# Count frequency of each skill
top_skills = Counter(all_skills).most_common(5)
top_skills_df = pd.DataFrame(top_skills, columns=["Skill", "Frequency"])
# Save to CSV (a separate file per skill type, so later sections do not overwrite it)
top_skills_df.to_csv("top_software_skills.csv", index=False)
# Create Plotly bar chart with salmon color
fig = px.bar(
top_skills_df,
x="Frequency",
y="Skill",
orientation='h',
title="Top 5 In-Demand IT Skills",
color_discrete_sequence=["salmon"] # global color theme
)
# Update layout for consistency
fig.update_layout(
title_font_size=18,
title_font_family="Arial",
plot_bgcolor="white",
paper_bgcolor="white",
font=dict(size=12),
margin=dict(t=60, l=50, r=30, b=50),
)
# Display chart
fig.show()
🚀 Top 5 In-Demand Software Skills in IT (Industry-Level)
The bar chart above highlights the top five most frequently requested software-related skills in IT job postings, based on the extracted data from the SOFTWARE_SKILLS_NAME column in the Lightcast dataset.
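The extraction depends on each SOFTWARE_SKILLS_NAME cell holding a list serialized as a string. The sketch below shows the same idea on a few hypothetical rows, not taken from the Lightcast file.

`parse_skill_list` is a hypothetical helper that parses one cell with `ast.literal_eval` and returns an empty list for missing or malformed values. It is applied to a single column with `Series.map`, rather than the row-wise `df.apply(..., axis=1)` used above, which is usually faster on large files.

```python
import ast
from collections import Counter

import pandas as pd

# Hypothetical sample rows: the real column stores lists serialized as strings
sample = pd.DataFrame(
    {
        "SOFTWARE_SKILLS_NAME": [
            '["SQL (Programming Language)", "Microsoft Excel"]',
            '["SQL (Programming Language)", "Python (Programming Language)"]',
            None,          # missing value
            "not a list",  # malformed entry
        ]
    }
)

def parse_skill_list(value):
    """Turn a stringified list into a Python list; return [] for missing or malformed values."""
    if pd.isna(value):
        return []
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []
    return parsed if isinstance(parsed, list) else []

# Parse one column, then produce one row per skill (empty lists become NaN and are dropped)
skills = sample["SOFTWARE_SKILLS_NAME"].map(parse_skill_list).explode().dropna()
print(Counter(skills).most_common(5))
```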
🔍 Observations:
- SQL (Programming Language) ranks as the most in-demand skill across job postings, signaling its foundational importance in data querying and backend systems.
- Microsoft Excel continues to be widely valued, especially in roles that demand reporting, data cleaning, and spreadsheet-based operations.
- Python (Programming Language), a critical skill for automation, data analysis, and machine learning, ranks third — affirming its strong market relevance.
- SAP Applications, often used in enterprise resource planning (ERP) and supply chain systems, signal demand for professionals with domain-specific technical expertise.
- Dashboard, a generic skill label that likely covers tools such as Tableau and Power BI, rounds out the top five, showing the need for data visualization and communication capabilities.
These findings serve as a benchmark for evaluating whether our team’s skill sets align with current industry expectations — and will help guide upskilling priorities in the subsequent gap analysis.
# Step 1: Load the top skills list
top_skill_names = top_skills_df["Skill"].tolist()
# Step 2: Manually map top skills to df_skills columns
skill_alias = {
"SQL (Programming Language)": "SQL (Programming Language)",
"Python (Programming Language)": "Python (Programming Language)",
"Microsoft Excel": "Microsoft Excel",
"Power BI": "Power BI",
"Tableau (Business Intelligence Software)": "Tableau",
"Dashboard": "Tableau",
"SAP Applications": "SAP Applications"
}
# Step 3: Map all top skills to df_skills column names
mapped_top_skills = [skill_alias.get(skill, skill) for skill in top_skill_names]
# Step 4: Align df_skills with the mapped top skills.
# reindex keeps the top-skill order and fills skills the team did not self-assess
# (e.g. SAP Applications) with 0, without adding columns to df_skills itself
df_skills_aligned = df_skills.reindex(columns=mapped_top_skills, fill_value=0)
# Shorten long x-axis tick labels
shortened_labels = {
"SQL (Programming Language)": "SQL",
"Python (Programming Language)": "Python",
"Microsoft Excel": "Excel",
"SAP Applications": "SAP",
"Tableau": "Tableau"
}
df_skills_aligned = df_skills_aligned.rename(columns=shortened_labels)
plt.figure(figsize=(12, 5))
sns.heatmap(df_skills_aligned, annot=True, cmap="coolwarm", linewidths=0.5)
plt.title("Team Skill Levels vs Top 5 In-Demand IT Skills (Mapped)")
plt.xticks(rotation=0, ha='center')
plt.tight_layout()
plt.show()
🧩 Team Skills vs. Top 5 In-Demand IT Skills (Mapped Heatmap)
This heatmap visualizes the alignment between our team’s current skill levels and the top 5 most frequently requested IT skills from job postings: SQL, Excel, Python, SAP, and Tableau.
🔍 Key Observations:
- SAP Applications show a complete skills gap across the team — no member currently reports proficiency, indicating a strong potential area for upskilling based on industry demand.
- SQL and Excel are relatively well covered. In SQL, Jason and Jitvan score 5 and Andrey 4. In Excel, Moiz scores 5 and Jason, Prabu, and Jitvan 4. Together these reflect strong database and spreadsheet competencies.
- Python remains an essential, in-demand skill. While most team members show intermediate-to-high proficiency, one member (Jitvan) has a lower score of 2, indicating a learning opportunity.
- Tableau, which stands in here for the generic “Dashboard” skill from the postings, is evenly represented, with all members scoring between 3 and 4, suggesting a consistent but improvable baseline.
📌 Takeaway:
This mapped heatmap offers a focused lens on how well our collective capabilities align with current market expectations. Notably, SAP stands out as an urgent area for learning, while ongoing refinement in Tableau and Python can further improve the team’s employability and project readiness.
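The takeaway ranks skills by how far the team falls short of market demand. The postings do not specify a proficiency level, so the sketch below assumes a hypothetical target of 4 on the 1–5 scale. It compares each skill's team average with that target and counts the members below it.

```python
import pandas as pd

# Aligned matrix as in the mapped heatmap (short labels).
# SAP was not part of the self-assessment, so it is filled with 0.
aligned = pd.DataFrame(
    {
        "SQL": [4, 3, 5, 3, 5],
        "Excel": [3, 5, 4, 4, 4],
        "Python": [5, 3, 4, 4, 2],
        "SAP": [0, 0, 0, 0, 0],
        "Tableau": [3, 4, 3, 4, 3],
    },
    index=pd.Index(["Andrey", "Moiz", "Jason", "Prabu", "Jitvan"], name="Name"),
)

TARGET_LEVEL = 4  # hypothetical "job-ready" level; not derived from the postings

team_mean = aligned.mean()
report = pd.DataFrame(
    {
        "team_mean": team_mean,
        # how far the team average falls short of the target (0 if met)
        "gap": (TARGET_LEVEL - team_mean).clip(lower=0),
        # how many members individually score below the target
        "members_below": (aligned < TARGET_LEVEL).sum(),
    }
)
print(report.sort_values("gap", ascending=False))
```

On these scores SAP has by far the largest gap, followed by Tableau and then Python, which matches the takeaway. The ordering depends on the chosen target.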
TOP 5 IN DEMAND SPECIALIZED SKILLS IN IT
# Define the relevant skill columns
skill_columns = ["SPECIALIZED_SKILLS_NAME"]
# Function to safely parse stringified lists
def extract_skills(row):
    skills = []
    for col in skill_columns:
        if pd.notna(row.get(col)):
            try:
                skills += ast.literal_eval(row[col])
            except Exception:
                continue
    return skills
# Apply function across rows
all_skills = df.apply(extract_skills, axis=1).explode().dropna()
# Count frequency of each skill
top_skills = Counter(all_skills).most_common(5)
top_skills_df = pd.DataFrame(top_skills, columns=["Skill", "Frequency"])
# Save to CSV (separate file so the software-skills results are not overwritten)
top_skills_df.to_csv("top_specialized_skills.csv", index=False)
fig = px.bar(
top_skills_df,
x="Frequency",
y="Skill",
orientation='h',
title="Top In-Demand SPECIALIZED SKILLS In IT",
color_discrete_sequence=["salmon"]
)
# Update layout to match the global theme
fig.update_layout(
plot_bgcolor='white',
paper_bgcolor='white',
font=dict(size=14, family="Arial"),
title_font=dict(size=18, family="Arial", color="black"),
margin=dict(t=50, l=100, r=30, b=50),
showlegend=False,
xaxis=dict(title='Frequency', gridcolor='lightgrey'),
yaxis=dict(title='Skill', gridcolor='lightgrey')
)
fig.show()
🛠️ Top In-Demand Specialized Skills in IT
The horizontal bar chart above identifies the five most sought-after specialized skills extracted from the SPECIALIZED_SKILLS_NAME column within the Lightcast job postings dataset. These reflect deeper domain-specific capabilities expected by employers across the IT sector.
🔍 Observations:
- Data Analysis emerged as the most frequently mentioned specialized skill, underlining its critical role in transforming raw information into actionable insights across industries.
- SQL (Programming Language) maintains its strong presence, reinforcing its dual role as both a software and specialized technical skill essential for data management and querying.
- Computer Science appears prominently, indicating employers’ preference for candidates with a foundational understanding of algorithms, system architecture, and programming principles.
- Project Management ranks high, suggesting that beyond technical proficiency, employers are seeking professionals who can also manage timelines, deliverables, and stakeholder expectations effectively.
- Business Process knowledge also stands out, pointing to demand for skills that bridge technical solutions with operational efficiency.
📌 Interpretation:
These specialized skills reflect a blend of technical expertise and operational acumen. Our skill matrix overlaps with them only on SQL, so Data Analysis, Computer Science, Project Management, and Business Process have not yet been measured for the team. Rather than confirmed gaps, they are candidate areas for self-assessment, targeted learning, and future curriculum alignment.
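Unassessed skills should stay distinct from genuine low scores. One option, not part of the original workflow, is to extend the skill matrix with `DataFrame.reindex`, which inserts missing columns as NaN rather than 0. The sketch below uses the five specialized skills listed above and the original `df_skills` column names.

```python
import pandas as pd

df_skills = pd.DataFrame(
    {
        "Python (Programming Language)": [5, 3, 4, 4, 2],
        "SQL (Programming Language)": [4, 3, 5, 3, 5],
        "Microsoft Excel": [3, 5, 4, 4, 4],
        "Power BI": [2, 4, 3, 3, 5],
        "Tableau": [3, 4, 3, 4, 3],
    },
    index=pd.Index(["Andrey", "Moiz", "Jason", "Prabu", "Jitvan"], name="Name"),
)

# Top specialized skills as reported in the chart above
top_specialized = [
    "Data Analysis",
    "SQL (Programming Language)",
    "Computer Science",
    "Project Management",
    "Business Process",
]

# Add any specialized skill the matrix lacks; reindex fills new columns with NaN
new_columns = [s for s in top_specialized if s not in df_skills.columns]
extended = df_skills.reindex(columns=list(df_skills.columns) + new_columns)

# Columns that are entirely NaN have not been self-assessed yet
all_missing = extended.isna().all()
not_assessed = all_missing[all_missing].index.tolist()
print("Not yet assessed:", not_assessed)
```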
TOP 5 IN DEMAND COMMON SKILLS IN IT
# Define the relevant skill columns
skill_columns = ["COMMON_SKILLS_NAME"]
# Function to safely parse stringified lists
def extract_skills(row):
    skills = []
    for col in skill_columns:
        if pd.notna(row.get(col)):
            try:
                skills += ast.literal_eval(row[col])
            except Exception:
                continue
    return skills
# Apply function across rows
all_skills = df.apply(extract_skills, axis=1).explode().dropna()
# Count frequency of each skill
top_skills = Counter(all_skills).most_common(5)
top_skills_df = pd.DataFrame(top_skills, columns=["Skill", "Frequency"])
# Save to CSV (separate file so earlier results are not overwritten)
top_skills_df.to_csv("top_common_skills.csv", index=False)
# Apply global theme
pio.templates.default = "plotly_white"
# Plotly common skills chart with consistent layout
fig = px.bar(
top_skills_df,
x="Frequency",
y="Skill",
orientation='h',
color_discrete_sequence=["salmon"],
title="Top In-Demand COMMON SKILLS In IT"
)
fig.update_layout(
yaxis=dict(autorange="reversed"),
margin=dict(t=50, l=50, r=25, b=25),
xaxis_title="Frequency",
yaxis_title=""
)
fig.show()
🤝 Top In-Demand Common (Soft) Skills in IT
The chart above showcases the top five most frequently listed common skills — often referred to as soft skills — based on the COMMON_SKILLS_NAME column in the job postings dataset. These are non-technical but highly valued in IT roles across all organizational levels.
🔍 Observations:
- Communication ranks highest, emphasizing the critical need for professionals who can clearly articulate ideas, collaborate effectively across teams, and convey technical findings to non-technical stakeholders.
- Management and Leadership both appear in the top five, reflecting the industry’s growing need for individuals who can not only execute tasks but also lead initiatives, manage teams, and drive strategic outcomes.
- Problem Solving is another top-listed trait, aligning with the complex, analytical nature of most IT roles where troubleshooting and innovative thinking are routine.
- Operations rounds out the list, suggesting a demand for process-oriented thinking, efficiency, and understanding of business workflows.
📌 Interpretation:
While often overlooked in technical training, these soft skills are crucial differentiators in hiring decisions. Regardless of specialization (data science, software development, or project management), interpersonal and leadership abilities significantly enhance employability. This insight reinforces the importance of combining technical expertise with strong communication and organizational skills in both curriculum design and individual development plans.
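The chart counts mentions. Expressing frequency instead as the share of postings that mention a skill at least once makes the numbers easier to compare across datasets of different sizes, and deduplicating per posting guards against any repeated entries. This is a suggested alternative view, not what the chart above shows.

The sketch below runs on hypothetical values, and `unique_skills` is a hypothetical helper.

```python
import ast

import pandas as pd

# Hypothetical COMMON_SKILLS_NAME values for four postings (not real data)
postings = pd.Series(
    [
        '["Communication", "Management", "Communication"]',  # repeated mention in one posting
        '["Communication", "Problem Solving"]',
        '["Leadership"]',
        None,  # posting with no common skills listed
    ]
)

def unique_skills(value):
    """Parse one stringified list and deduplicate it, so a skill counts once per posting."""
    if pd.isna(value):
        return []
    try:
        return sorted(set(ast.literal_eval(value)))
    except (ValueError, SyntaxError):
        return []

per_posting = postings.map(unique_skills)
# Share of all postings (including those with no skills listed) that mention each skill
share = per_posting.explode().dropna().value_counts() / len(postings)
print((share * 100).round(1))
```

Whether postings with no listed skills belong in the denominator is a design choice. Including them, as here, gives the share of all postings rather than the share of postings with skill data.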
import pandas as pd
import ast
from collections import Counter
import plotly.express as px
# Define relevant skill columns and their categories
skill_sources = {
"SPECIALIZED_SKILLS_NAME": "Specialized",
"COMMON_SKILLS_NAME": "Common",
"SOFTWARE_SKILLS_NAME": "Software"
}
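# Function to safely extract list-like strings from one column
def extract_skills(df, column_name):
    all_skills = []
    for row in df[column_name].dropna():
        try:
            all_skills += ast.literal_eval(row)
        except (ValueError, SyntaxError):
            # skip malformed entries rather than catching every exception with a bare except
            continue
    return all_skills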
# Create a DataFrame to store top skills across categories
treemap_data = []
for col, category in skill_sources.items():
    skills = extract_skills(df, col)
    top_5 = Counter(skills).most_common(5)
    for skill, freq in top_5:
        treemap_data.append({
            "Skill": skill,
            "Category": category,
            "Frequency": freq
        })
# Convert to DataFrame
treemap_df = pd.DataFrame(treemap_data)
# Plot Treemap
# Set global Plotly theme
pio.templates.default = "plotly_white"
# Treemap plot
fig = px.treemap(
treemap_df,
path=["Category", "Skill"],
values="Frequency",
color="Category",
color_discrete_map={
"Specialized": "mediumpurple",
"Common": "salmon",
"Software": "skyblue"
},
title="Top 5 In-Demand IT Skills by Category"
)
# Add margin and show legend
fig.update_layout(
margin=dict(t=50, l=25, r=25, b=25),
legend_title_text="Skill Category",
legend=dict(
traceorder="normal",
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="center",
x=0.5
)
)
fig.show()
🗂️ Top 5 In-Demand IT Skills by Category – Treemap Visualization
The treemap above presents a unified visual overview of the most in-demand IT skills, categorized into three primary groups: Common, Specialized, and Software. Each rectangle’s size represents the relative frequency of that skill across job postings.
🔍 Category Breakdown:
- Common Skills (salmon):
  - This category dominates the overall space, with Communication and Management leading the way.
  - Other prominent soft skills include Problem Solving, Leadership, and Operations, reinforcing the industry’s continued emphasis on interpersonal and leadership competencies alongside technical abilities.
- Specialized Skills (medium purple):
  - Data Analysis and SQL (Programming Language) are the top mentions, reflecting their value in strategic decision-making and backend infrastructure.
  - Skills such as Computer Science, Project Management, and Business Process suggest a need for foundational technical understanding coupled with domain-specific operational insight.
- Software Skills (sky blue):
  - These are tools and technologies actively used in IT workflows.
  - SQL appears here as well as under Specialized, confirming its dual role as both specialized knowledge and a hands-on tool. Excel and Python complete the core tool set.
  - SAP Applications and Dashboard (a generic label that likely covers visualization platforms such as Tableau and Power BI) reflect enterprise-level technical expectations.
📌 Interpretation:
This treemap effectively integrates the earlier bar charts into one cohesive view, allowing for quick comparative analysis across skill types. It visually reinforces the multi-dimensional expectations of IT professionals — who must blend communication, strategic insight, and software proficiency to meet market demands.
As we move toward building an improvement plan, this treemap helps prioritize upskilling areas based on both category-level and individual skill-level significance.
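Common skills dominate the treemap by raw count. Expressing each skill as a share of its own category keeps the smaller categories readable. The sketch below uses pandas `groupby(...).transform("sum")`. The frequencies are made-up placeholders for illustration, not values from the Lightcast data; in the notebook the same computation would run on `treemap_df`.

```python
import pandas as pd

# Made-up counts for illustration only (not Lightcast results)
treemap_df = pd.DataFrame(
    {
        "Category": ["Common", "Common", "Specialized", "Specialized", "Software", "Software"],
        "Skill": [
            "Communication",
            "Management",
            "Data Analysis",
            "SQL (Programming Language)",
            "SQL (Programming Language)",
            "Microsoft Excel",
        ],
        "Frequency": [300, 100, 60, 40, 50, 50],
    }
)

# Share of each skill within its own category, so categories of different size can be compared
category_total = treemap_df.groupby("Category")["Frequency"].transform("sum")
treemap_df["ShareInCategory"] = treemap_df["Frequency"] / category_total

# Category totals drive the size of the top-level rectangles in the treemap
print(treemap_df.groupby("Category")["Frequency"].sum())
print(treemap_df)
```

`px.treemap` accepts a numeric column for `color`. Passing `color="ShareInCategory"` would therefore shade each tile by its within-category share, while tile area still reflects raw frequency.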
📈 3.1.3 Propose an Improvement Plan
🔹 Andrey
- Skills to Prioritize Learning: Power BI, Tableau, SAP Applications
- Courses or Resources:
  - Microsoft Power BI Data Analyst
  - Introduction to Tableau – UC Davis
  - SAP Professional Fundamentals
- Team Collaboration Suggestion:
  - Shadow Jason and Prabu on Excel reporting dashboards.
  - Organize peer-led walkthroughs with Moiz on Tableau once Moiz completes his course.
🔹 Moiz
- Skills to Prioritize Learning: Python, Power BI, SAP Applications
- Courses or Resources:
  - Python for Everybody – University of Michigan
  - Power BI for Beginners – Microsoft
  - Becoming an SAP Professional
- Team Collaboration Suggestion:
  - Pair with Jason to co-develop a dashboard and improve Power BI skills.
  - Conduct weekly learning swaps with Jitvan to practice Python.
🔹 Jason
- Skills to Prioritize Learning: Power BI, SAP Applications
- Courses or Resources:
  - Microsoft Power BI for Beginners
  - SAP Technology Consultant – SAP
- Team Collaboration Suggestion:
  - Lead weekly recap sessions on SQL with teammates.
  - Practice SAP module navigation with Moiz and Andrey after completing the course.
🔹 Prabu
- Skills to Prioritize Learning: SAP Applications
- Courses or Resources:
  - Implementing an SAP Solution
- Team Collaboration Suggestion:
  - Lead a team-wide “SAP Sunday” mini hackathon once a month for hands-on SAP tasks.
  - Assist others in brushing up Python and Excel via peer mentoring.
🔹 Jitvan
- Skills to Prioritize Learning: Python, Excel, Tableau
- Courses or Resources:
  - Crash Course on Python – Google
  - Excel Skills for Business – Macquarie University
  - Data Visualization with Tableau – UC Davis
- Team Collaboration Suggestion:
  - Co-present Tableau project findings with Moiz to practice and receive feedback.
  - Schedule weekly “code & coffee” Python debugging sessions with the team.
🤝 Summary Collaboration Ideas
- Assign a “Skill Champion” for each domain (e.g., SQL – Jason, Tableau – Moiz) to mentor others.
- Implement weekly 30-minute peer-learning check-ins.
- Maintain a shared Notion or Google Doc for learning progress, notes, and resources.
- Host monthly demo days to present completed mini-projects using new tools learned.
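The “Skill Champion” idea can be seeded from the self-assessment itself. The sketch below assumes shortened column names and lists, for each tool, every member holding the top score. Ties are kept rather than silently resolved to one name.

```python
import pandas as pd

# Self-assessed scores (shortened column names)
df_skills = pd.DataFrame(
    {
        "Python": [5, 3, 4, 4, 2],
        "SQL": [4, 3, 5, 3, 5],
        "Excel": [3, 5, 4, 4, 4],
        "Power BI": [2, 4, 3, 3, 5],
        "Tableau": [3, 4, 3, 4, 3],
    },
    index=pd.Index(["Andrey", "Moiz", "Jason", "Prabu", "Jitvan"], name="Name"),
)

champions = {}
for tool in df_skills.columns:
    scores = df_skills[tool]
    # Boolean mask keeps every member tied for the top score
    champions[tool] = scores[scores == scores.max()].index.tolist()

for tool, members in champions.items():
    print(f"{tool}: {', '.join(members)}")
```

On the current scores this yields Jason or Jitvan for SQL and Moiz or Prabu for Tableau, consistent with the examples above. Ties could be broken by availability or by picking someone not already championing another tool.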